Abstract. Let $M_n = X_1 + \dots + X_n$ be a sum of independent random variables such that $X_k \le 1$,
$\mathbb{E} X_k = 0$ and $\mathbb{E} X_k^2 = \sigma_k^2$ for all $k$. Hoeffding 1963, Theorem 3, proved that

$$\mathbb{P}\{M_n \ge nt\} \le H^n(t,p), \qquad H(t,p) = (1 + qt/p)^{-(p+qt)} (1-t)^{-(q-qt)}$$

with

$$q = \frac{1}{1+\sigma^2}, \qquad p = 1-q, \qquad \sigma^2 = \frac{\sigma_1^2 + \dots + \sigma_n^2}{n}, \qquad 0 < t < 1.$$

Bentkus 2004 improved Hoeffding's inequalities using binomial tails as upper bounds. Let
$\gamma_k = \mathbb{E} X_k^3/\sigma_k^3$ and $\varkappa_k = \mathbb{E} X_k^4/\sigma_k^4$ stand for the skewness and kurtosis
of $X_k$. In this paper we prove (improved) counterparts of the Hoeffding inequality replacing $\sigma^2$
by certain functions of $\gamma_1, \dots, \gamma_n$, respectively $\varkappa_1, \dots, \varkappa_n$. Our bounds extend to a
general setting where $X_k$ are martingale differences, and they can combine the knowledge of
skewness and/or kurtosis and/or variances of $X_k$. Up to factors bounded by $e^2/2$ the bounds are
final. All our results are new since no inequalities incorporating skewness or kurtosis control
so far are known.



1. Introduction and results 

In a celebrated paper of Hoeffding 1963 several inequalities for sums of bounded random
variables were established. For improvements of the Hoeffding inequalities and related results
see, for example, Talagrand 1995, McDiarmid 1989, Godbole and Hitczenko 1998, Pinelis 1998-2007,
Laib 1999, B 2001-2007, van de Geer 2002, Perron 2003, BGZ 2006, BGPZ 2006,
BKZ 2006, 2007, BZ 2003, etc. Up to certain constant factors, these improvements are close
to the final optimal inequalities, see B 2004, BKZ 2006. However, so far no bounds taking
into account information related to skewness and/or kurtosis are known, not to mention certain
results related to symmetric random variables, see BGZ 2006, BGPZ 2006. In this paper we prove
general and optimal counterparts of Hoeffding's 1963 Theorem 3, using assumptions related to
skewness and/or kurtosis.

Let us recall Hoeffding's 1963 Theorem 3. Let $M_n = X_1 + \dots + X_n$ be a sum of independent
random variables such that $X_k \le 1$, $\mathbb{E} X_k = 0$, and $\mathbb{E} X_k^2 = \sigma_k^2$ for all $k$. Write

$$\sigma^2 = \frac{\sigma_1^2 + \dots + \sigma_n^2}{n}, \qquad p = \frac{\sigma^2}{1+\sigma^2}, \qquad q = 1-p.$$

Hoeffding 1963, Theorem 3, established the inequality

$$\mathbb{P}\{M_n \ge nt\} \le H^n(t,p), \qquad H(t,p) = (1 + qt/p)^{-(p+qt)} (1-t)^{-(q-qt)} \tag{1.1}$$

assuming that $0 < t < 1$. One can rewrite $H^n(t,p)$ as

$$H^n(t,p) = \inf_{h>0} \exp\{-hnt\}\, \mathbb{E} \exp\{hT_n\},$$

where $T_n = \varepsilon_1 + \dots + \varepsilon_n$ is a sum of $n$ independent copies of a Bernoulli random variable,
say $\varepsilon = \varepsilon(\sigma^2)$, such that

$$\mathbb{P}\{\varepsilon = -\sigma^2\} = q, \qquad \mathbb{P}\{\varepsilon = 1\} = p, \qquad \mathbb{E}\varepsilon^2 = \sigma^2. \tag{1.2}$$

Using the shorthand $x = nt$, we can rewrite the Hoeffding result as

$$\mathbb{P}\{M_n \ge x\} \le \inf_{h>0} e^{-hx}\, \mathbb{E} e^{hT_n}. \tag{1.3}$$
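The identity between the closed form in (1.1) and the infimum in (1.3) can be checked numerically. The sketch below is only a sanity check, not part of the original argument. It writes $H(t,p)$ with explicit negative exponents, as in Hoeffding's statement, and minimises the convex function $h \mapsto -ht + \log \mathbb{E} e^{h\varepsilon}$ by golden-section search. The function names and the search interval $[0, 50]$ are our own assumptions.

```python
import math


def hoeffding_H(t, sigma2):
    """Closed form H(t, p) of (1.1), with q = 1/(1 + sigma2) and p = 1 - q."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    return (1.0 + q * t / p) ** (-(p + q * t)) * (1.0 - t) ** (-(q - q * t))


def log_objective(h, t, sigma2):
    """log of exp(-h t) * E exp(h eps) for the two-point variable eps of (1.2)."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    return -h * t + math.log(q * math.exp(-h * sigma2) + p * math.exp(h))


def golden_section_min(f, lo, hi, iters=200):
    """Locate the minimiser of a convex function f on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def hoeffding_H_numeric(t, sigma2, h_max=50.0):
    """inf over 0 < h < h_max of exp(-h t) * E exp(h eps), found numerically."""
    h_star = golden_section_min(lambda h: log_objective(h, t, sigma2), 0.0, h_max)
    return math.exp(log_objective(h_star, t, sigma2))


if __name__ == "__main__":
    print(hoeffding_H(0.3, 0.5), hoeffding_H_numeric(0.3, 0.5))
```

Raising either value to the power $n$ gives the bound for $\mathbb{P}\{M_n \ge nt\}$; the per-summand form is used here because the infimum factorises over the $n$ independent copies.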

In B 2004 the inequality (1.3) is improved to

$$\mathbb{P}\{M_n \ge x\} \le \inf_{h<x} (x-h)^{-2}\, \mathbb{E}(T_n - h)_+^2. \tag{1.4}$$

Here $z_+ = \max\{z, 0\}$ and $z_+^2 = (z_+)^2$. Actually, inequalities (1.1), (1.3) and (1.4) extend to
cases where $M_n$ is a martingale or even a supermartingale, see B 2004 for a proof. In the case
of (1.1) and (1.3) this was noted already by Hoeffding 1963.

The right-hand side of (1.4) satisfies

$$\inf_{h<x} (x-h)^{-2}\, \mathbb{E}(T_n - h)_+^2 \le \frac{e^2}{2}\, \mathbb{P}\{T_n \ge x\}, \qquad e = 2.718\ldots \tag{1.5}$$

for integer $x \in \mathbb{Z}$. For non-integer $x$ one has to interpolate the probability log-linearly, see B 2004
for details. The right-hand side of (1.4) can be given explicitly as a function of $x$, $p$ and $n$,
see BKZ 2006, as well as Section 2 of the present paper. To have bounds as tight as possible is
essential for statistical applications, like those in audit, see BZ 2003.
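To get a feeling for the relative size of the exact tail, the quadratic bound (1.4) and the exponential bound (1.3), one can evaluate all three for a small binomial example. The sketch below makes several assumptions of our own. The infimum over $h < x$ is replaced by a minimum over a finite grid. This can only overestimate the infimum, and it still gives a valid upper bound, since every single $h < x$ yields one. The function names and the grid size are hypothetical choices.

```python
import math


def bernoulli_sum(n, sigma2):
    """Atoms and probabilities of T_n, a sum of n i.i.d. copies of eps from (1.2)."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    atoms = [-n * sigma2 + s * (1.0 + sigma2) for s in range(n + 1)]
    probs = [math.comb(n, s) * q ** (n - s) * p ** s for s in range(n + 1)]
    return atoms, probs


def exact_tail(atoms, probs, x):
    """P{T_n >= x}."""
    return sum(pr for d, pr in zip(atoms, probs) if d >= x)


def quadratic_bound(atoms, probs, x, grid_size=4000):
    """Grid approximation (from above) of inf_{h<x} (x-h)^{-2} E (T_n - h)_+^2."""
    span = x - atoms[0]  # requires x to exceed the smallest atom
    best = math.inf
    for j in range(1, grid_size + 1):
        h = x - 2.0 * span * j / grid_size
        moment = sum(pr * (d - h) ** 2 for d, pr in zip(atoms, probs) if d > h)
        best = min(best, moment / (x - h) ** 2)
    return best


def exponential_bound(n, sigma2, x):
    """H^n(t, p) with t = x / n, valid for 0 < x < n."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    t = x / n
    H = (1.0 + q * t / p) ** (-(p + q * t)) * (1.0 - t) ** (-(q - q * t))
    return H ** n


if __name__ == "__main__":
    atoms, probs = bernoulli_sum(10, 1.0)
    print(exact_tail(atoms, probs, 4.0),
          quadratic_bound(atoms, probs, 4.0),
          exponential_bound(10, 1.0, 4.0))
```

For $n = 10$, $\sigma^2 = 1$ and $x = 4$ the three numbers come out in increasing order, in agreement with (1.3), (1.4) and (1.5).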

Our intention in this paper is to develop methods leading to counterparts of (1.1), (1.3)
and (1.4) such that information related to the skewness and kurtosis

$$\gamma_k = \frac{\mathbb{E}(X_k - \mathbb{E}X_k)^3}{\sigma_k^3}, \qquad \varkappa_k = \frac{\mathbb{E}(X_k - \mathbb{E}X_k)^4}{\sigma_k^4} \tag{1.6}$$

of $X_k$ is taken into account (in this paper we define $\gamma_k = \infty$ and $\varkappa_k = 1$ if $\sigma_k = 0$). All our
results hold in a general martingale setting.






All known proofs of inequalities of type (1.3) and (1.4) start with an application of Chebyshev's
inequality. For example, in the case of (1.4) we can estimate

$$\mathbb{P}\{M_n \ge x\} \le \inf_{h<x} (x-h)^{-2}\, \mathbb{E}(M_n - h)_+^2, \tag{1.7}$$

since the indicator function $t \mapsto \mathbb{I}\{t \ge x\}$ obviously satisfies $\mathbb{I}\{t \ge x\} \le (x-h)^{-2} (t-h)_+^2$ for
all $t \in \mathbb{R}$. The further proof of (1.4) consists in showing that $\mathbb{E}(M_n - h)_+^2 \le \mathbb{E}(T_n - h)_+^2$ for
all $h \in \mathbb{R}$. We would like to emphasize that all our proofs are optimal in the sense that no further
improvements are possible in estimation of $\mathbb{E}(M_n - h)_+^2$. Indeed, in the special case $M_n = T_n$
the inequality $\mathbb{E}(M_n - h)_+^2 \le \mathbb{E}(T_n - h)_+^2$ turns into the equality $\mathbb{E}(T_n - h)_+^2 = \mathbb{E}(T_n - h)_+^2$.

In view of (1.7) it is natural to introduce and to study transforms $G \mapsto G_\beta$ of survival
functions $G(x) = \mathbb{P}\{X \ge x\}$ of the type

$$G_\beta(x) = \inf_{h<x} (x-h)^{-\beta}\, \mathbb{E}(X - h)_+^\beta, \qquad \beta > 0, \tag{1.8}$$

defining $G_0 = G$ in the case $\beta = 0$. See Pinelis 1988, 1989, B 2004, BKZ 2006 for related known
results.

The paper is organized as follows. In the Introduction we provide necessary definitions and
formulations of our results, including their versions for sums of martingale differences. In Section
2 we recall a description of the transform $G \mapsto G_2$ of binomial survival functions; our bounds
are given using $G_2$. Section 3 contains proofs of the results.

Henceforth $M_n = X_1 + \dots + X_n$ stands for a martingale sequence such that the differences $X_k$
are uniformly bounded (we set $M_0 = X_0 = 0$). Without loss of generality we can assume that the
bounding constant is 1, that is, that $X_k \le 1$. Let $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \dots \subset \mathcal{F}_n$ be a related sequence of
$\sigma$-algebras such that $M_k$ are $\mathcal{F}_k$-measurable. Introduce the conditional variance $s_k^2$, skewness $g_k$
and kurtosis $c_k$ of $X_k$ by

$$s_k^2 = \mathbb{E}(X_k^2 \mid \mathcal{F}_{k-1}), \qquad g_k = \mathbb{E}(X_k^3 \mid \mathcal{F}_{k-1})/s_k^3, \qquad c_k = \mathbb{E}(X_k^4 \mid \mathcal{F}_{k-1})/s_k^4. \tag{1.9}$$

Note that $s_k^2$, $g_k$, $c_k$ are $\mathcal{F}_{k-1}$-measurable random variables.

Remark 1.1. We prove our results using (1.4) for martingales. It is proved in B 2004 that
all three inequalities (1.1), (1.3) and (1.4) hold with $\sigma^2 = (\sigma_1^2 + \dots + \sigma_n^2)/n$ if $M_n$ is a martingale
with differences $X_k \le 1$ such that the conditional variances $s_k^2$ satisfy $s_k^2 \le \sigma_k^2$ for all $k$.

It is easy to check that Bernoulli random variables $\varepsilon = \varepsilon(\sigma^2)$ of type (1.2) have variance $\sigma^2$
and skewness $\gamma$ related as

$$\gamma = \frac{1}{\sigma} - \sigma, \qquad \sigma^2 = u^2(\gamma), \qquad \text{where } u(x) = \sqrt{1 + \frac{x^2}{4}} - \frac{x}{2}. \tag{1.10}$$

Theorem 1.2. Assume that the differences $X_k$ of a martingale $M_n$ satisfy $X_k \le 1$, and that
the conditional skewness $g_k$ of $X_k$ are bounded from below by some non-random $\gamma_k$, that is, that

$$g_k \ge \gamma_k, \qquad k = 1, 2, \dots, n. \tag{1.11}$$

Then (1.3) and (1.4) hold with $T_n$ being a sum of $n$ independent copies of a Bernoulli random
variable $\varepsilon = \varepsilon(\sigma^2)$ of type (1.2) with skewness $\gamma$ and variance $\sigma^2$ defined by

$$\gamma = \frac{1}{\sigma} \cdot \frac{\gamma_1 u(\gamma_1) + \dots + \gamma_n u(\gamma_n)}{n}, \qquad \sigma^2 = \frac{u^2(\gamma_1) + \dots + u^2(\gamma_n)}{n}. \tag{1.12}$$

In the special case where all $\gamma_k$ are equal, $\gamma_1 = \dots = \gamma_n = \gamma$, the Bernoulli random variable has
skewness $\gamma$ and variance $\sigma^2 = u^2(\gamma)$.
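The relations between variance and skewness of the two-point variable can be verified directly from its atoms. The sketch below is a numerical check only, with hypothetical function names. It computes the second and third moments of $\varepsilon(\sigma^2)$ and compares the skewness with $1/\sigma - \sigma$. It then evaluates the averaged variance $\sigma^2 = (u^2(\gamma_1) + \dots + u^2(\gamma_n))/n$. Since $u$ solves $u^2 + xu - 1 = 0$, we have $1 - u^2(x) = x\,u(x)$, so the skewness $1/\sigma - \sigma$ of the resulting Bernoulli variable equals $(\gamma_1 u(\gamma_1) + \dots + \gamma_n u(\gamma_n))/(n\sigma)$, which is the form used in the code.

```python
import math


def u(x):
    """u(x) = sqrt(1 + x^2/4) - x/2, the positive root of s^2 + x s - 1 = 0."""
    return math.sqrt(1.0 + x * x / 4.0) - x / 2.0


def bernoulli_variance_and_skewness(sigma2):
    """Moments of eps: value -sigma2 with probability q, value 1 with probability p."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    m2 = q * sigma2 ** 2 + p
    m3 = -q * sigma2 ** 3 + p
    return m2, m3 / m2 ** 1.5


def skewness_parameters(gammas):
    """Averaged variance sigma^2 and the matching Bernoulli skewness gamma."""
    n = len(gammas)
    sigma2 = sum(u(g) ** 2 for g in gammas) / n
    sigma = math.sqrt(sigma2)
    gamma = sum(g * u(g) for g in gammas) / (n * sigma)
    return sigma2, gamma


if __name__ == "__main__":
    print(bernoulli_variance_and_skewness(0.25))
    print(skewness_parameters([-1.0, 0.0, 2.0]))
```

For large positive $x$ the expression for $u(x)$ suffers from cancellation; the equivalent form $1/(\sqrt{1 + x^2/4} + x/2)$ would be the numerically safer choice there.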

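It is easy to see that Bernoulli random variables $\varepsilon = \varepsilon(\sigma^2)$ of type (1.2) have variance $\sigma^2$ and
kurtosis $\varkappa$ related as

$$\varkappa = \frac{1}{\sigma^2} - 1 + \sigma^2, \qquad 2\sigma^2 = \varkappa + 1 \pm \sqrt{(\varkappa + 1)^2 - 4}. \tag{1.13}$$

In particular

$$\sigma^2 \le v(\varkappa), \qquad \text{where } 2v(t) = t + 1 + \sqrt{(t+1)^2 - 4}. \tag{1.14}$$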
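The same kind of check works for the kurtosis. In the sketch below (function names are our own) the fourth moment of $\varepsilon(\sigma^2)$ is computed from the two atoms and compared with $1/\sigma^2 - 1 + \sigma^2$. Two different variances, $\sigma^2$ and $1/\sigma^2$, give the same kurtosis, which is why only the inequality $\sigma^2 \le v(\varkappa)$ with the larger root $v$ can be asserted.

```python
import math


def v(t):
    """Larger root of s^2 - (t + 1) s + 1 = 0, i.e. 2 v(t) = t + 1 + sqrt((t+1)^2 - 4)."""
    return (t + 1.0 + math.sqrt((t + 1.0) ** 2 - 4.0)) / 2.0


def bernoulli_kurtosis(sigma2):
    """E eps^4 / (E eps^2)^2 for eps = -sigma2 w.p. q and eps = 1 w.p. p."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    m2 = q * sigma2 ** 2 + p
    m4 = q * sigma2 ** 4 + p
    return m4 / m2 ** 2


if __name__ == "__main__":
    # sigma^2 = 2 and sigma^2 = 1/2 share one kurtosis value; v picks the larger variance.
    print(bernoulli_kurtosis(2.0), bernoulli_kurtosis(0.5), v(1.5))
```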

Theorem 1.3. Assume that the differences $X_k$ of a martingale $M_n$ satisfy $X_k \le 1$, and that
the conditional kurtosis $c_k$ of $X_k$ are bounded from above by some non-random $\varkappa_k$, that is, that

$$c_k \le \varkappa_k, \qquad k = 1, 2, \dots, n. \tag{1.15}$$

Then (1.3) and (1.4) hold with $T_n$ being a sum of $n$ independent copies of a Bernoulli random
variable $\varepsilon = \varepsilon(\sigma^2)$ of type (1.2) with kurtosis $\varkappa$ and variance $\sigma^2$ defined by

$$\varkappa = \frac{1}{\sigma^2} - 1 + \sigma^2, \qquad \sigma^2 = \frac{v(\varkappa_1) + \dots + v(\varkappa_n)}{n}, \tag{1.16}$$

where the function $v$ is given in (1.14). In the special case where $\varkappa_1 = \dots = \varkappa_n = \varkappa$, the
Bernoulli random variable has kurtosis $\varkappa$ and variance $\sigma^2 = v(\varkappa)$.

The next Theorem 1.4 allows us to combine our knowledge about variances, skewness and
kurtosis. Theorems 1.2, 1.3 and (1.3), (1.4) for martingales (see Remark 1.1) are special cases of
Theorem 1.4 setting in various combinations $\sigma_k^2 = \infty$, $\gamma_k = -\infty$, $\varkappa_k = \infty$.

Theorem 1.4. Assume that the differences $X_k$ of a martingale $M_n$ satisfy $X_k \le 1$, and that
their conditional variances $s_k^2$, skewness $g_k$ and kurtosis $c_k$ satisfy

$$s_k^2 \le \sigma_k^2, \qquad g_k \ge \gamma_k, \qquad c_k \le \varkappa_k, \qquad k = 1, 2, \dots, n \tag{1.17}$$

with some non-random $\sigma_k^2 \ge 0$, $\gamma_k \ge -\infty$ and $1 \le \varkappa_k \le \infty$. Assume that numbers $\alpha_k^2$ satisfy

$$\alpha_k^2 \ge \min\{\sigma_k^2,\ u^2(\gamma_k),\ v(\varkappa_k)\}.$$

Then (1.3) and (1.4) hold with $T_n$ being a sum of $n$ independent copies of a Bernoulli random
variable $\varepsilon = \varepsilon(\sigma^2)$ of type (1.2) with

$$\sigma^2 = \frac{\alpha_1^2 + \dots + \alpha_n^2}{n},$$

where functions $u$ and $v$ are defined in (1.10) and (1.14) respectively.
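In practice the theorem is applied by taking, for each $k$, the smallest of the three available variance bounds and averaging. A minimal sketch, assuming the per-summand bounds are supplied as three lists of equal length, with `math.inf` (respectively `-math.inf` for the skewness) marking missing information; `combined_sigma2` is a hypothetical helper name:

```python
import math


def u(x):
    """u(x) = sqrt(1 + x^2/4) - x/2; u(-inf) is read as +inf (no skewness information)."""
    if x == -math.inf:
        return math.inf
    return math.sqrt(1.0 + x * x / 4.0) - x / 2.0


def v(t):
    """2 v(t) = t + 1 + sqrt((t+1)^2 - 4); v(+inf) is read as +inf (no kurtosis information)."""
    if t == math.inf:
        return math.inf
    return (t + 1.0 + math.sqrt((t + 1.0) ** 2 - 4.0)) / 2.0


def combined_sigma2(variance_bounds, skewness_bounds, kurtosis_bounds):
    """Average of alpha_k^2 = min{sigma_k^2, u^2(gamma_k), v(kappa_k)}."""
    alphas = [
        min(s2, u(g) ** 2, v(k))
        for s2, g, k in zip(variance_bounds, skewness_bounds, kurtosis_bounds)
    ]
    return sum(alphas) / len(alphas)


if __name__ == "__main__":
    # First summand: variance <= 1 and skewness >= 1.5; second: variance <= 1, kurtosis <= 1.5.
    print(combined_sigma2([1.0, 1.0], [1.5, -math.inf], [math.inf, 1.5]))
```

In this example the skewness bound lowers the first term from $1$ to $u^2(1.5) = 0.25$, while the kurtosis bound $v(1.5) = 2$ does not improve on the variance bound of the second term.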

Remark 1.5. All our inequalities can be extended to the case where $M_n$ is a supermartingale.
Furthermore, their maximal versions hold, that is, in the left-hand sides of these
inequalities we can replace $\mathbb{P}\{M_n \ge x\}$ by $\mathbb{P}\bigl\{\max_{1 \le k \le n} M_k \ge x\bigr\}$.

Remark 1.6. One can estimate the right-hand sides of our inequalities using Poisson
distributions. In the case of Hoeffding's functions this is done by Hoeffding 1963. In notation
of (1.1) his bound is

$$H^n(t,p) \le \inf_{h>0} e^{-hx}\, \mathbb{E} e^{h(\eta - \lambda)} = \exp\Bigl\{x - (x + \lambda) \ln \frac{x + \lambda}{\lambda}\Bigr\}, \tag{1.18}$$

where $x = tn$, $\lambda = n\sigma^2$, and $\eta$ is a Poisson random variable with parameter $\lambda$. It is shown in
the proof of Theorem 1.1 in B 2004, that if $T_n$ is a sum of $n$ independent copies of a Bernoulli
random variable $\varepsilon = \varepsilon(\sigma^2)$, then

$$\inf_{h<x} (x-h)^{-2}\, \mathbb{E}(T_n - h)_+^2 \le \inf_{h<x} (x-h)^{-2}\, \mathbb{E}(\eta - \lambda - h)_+^2, \tag{1.19}$$

where $\eta$ is a Poisson random variable with parameter $\lambda = n\sigma^2$. The right-hand side of (1.19) is
given as an explicit function of $\lambda$ and $x$ in BKZ 2006.
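The Poisson-type estimate of Hoeffding's function is easy to compare with the function itself. The sketch below evaluates $H^n(t,p)$, written with explicit negative exponents, and the closed form $\exp\{x - (x+\lambda)\ln((x+\lambda)/\lambda)\}$ with $x = tn$ and $\lambda = n\sigma^2$. The function names are our own.

```python
import math


def hoeffding_bound(n, t, sigma2):
    """H^n(t, p) with q = 1/(1 + sigma2), p = 1 - q, for 0 < t < 1."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    H = (1.0 + q * t / p) ** (-(p + q * t)) * (1.0 - t) ** (-(q - q * t))
    return H ** n


def poisson_bound(n, t, sigma2):
    """exp{x - (x + lam) ln((x + lam)/lam)} with x = t n and lam = n sigma2."""
    x = t * n
    lam = n * sigma2
    return math.exp(x - (x + lam) * math.log((x + lam) / lam))


if __name__ == "__main__":
    print(hoeffding_bound(10, 0.4, 1.0), poisson_bound(10, 0.4, 1.0))
```

The Poisson form is simpler but weaker; for $n = 10$, $t = 0.4$, $\sigma^2 = 1$ the two values differ by a little over ten percent.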

Remark 1.7. A law of transformation $\{\sigma_1^2, \dots, \sigma_n^2\} \mapsto \sigma^2$ in (1.1), (1.3) and (1.4) is a linear
function. In bounds involving skewness and kurtosis corresponding transformations are
non-linear, see, for example, (1.12), where the transformation $\{\gamma_1, \dots, \gamma_n\} \mapsto \gamma$ is given explicitly.



2. An analytic description of transforms $G_2$ of binomial survival functions $G$.



In this section we recall an explicit analytical description of the right-hand side of (1.4)

$$G_2(x) = \inf_{h<x} (x-h)^{-2}\, \mathbb{E}(T_n - h)_+^2,$$

where $T_n$ is a sum of $n$ independent copies of the Bernoulli random variable (1.2). The description
is taken from BKZ 2006. Let $G(x) = \mathbb{P}\{T_n \ge x\}$ be the survival function of $T_n$. The probabilities
$p$, $q$ and the variance $\sigma^2$ are defined in (1.2). Write $\lambda = pn$. The sum $T_n = \varepsilon_1 + \dots + \varepsilon_n$
assumes the values

$$d_s = -n\sigma^2 + s(1 + \sigma^2) = \frac{s - \lambda}{q}, \qquad s = 0, 1, \dots, n.$$

The related probabilities satisfy

$$p_{n,s} = \mathbb{P}\{T_n = d_s\} = \binom{n}{s} q^{n-s} p^s.$$

The values $G(d_s)$ of the survival function of the random variable $T_n$ are given by

$$G(d_s) = p_{n,s} + \dots + p_{n,n}.$$

Write

$$\nu_{n,s} = \frac{s\, p_{n,s}}{G(d_s)}.$$
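The quantities entering this description are straightforward to tabulate. The sketch below lists, for each $s$, the atom $d_s = (s - \lambda)/q$, the probability $p_{n,s}$, the survival value $G(d_s)$ and the ratio $\nu_{n,s}$. The ratio is taken to be $s\,p_{n,s}/G(d_s)$, which is our reading of the display above and should be treated as an assumption. The function name and the row layout are hypothetical.

```python
import math


def binomial_description(n, sigma2):
    """Rows (s, d_s, p_{n,s}, G(d_s), nu_{n,s}) for T_n built from eps(sigma2)."""
    q = 1.0 / (1.0 + sigma2)
    p = 1.0 - q
    lam = p * n
    pmf = [math.comb(n, s) * q ** (n - s) * p ** s for s in range(n + 1)]
    rows = []
    survival = 0.0
    # Accumulate the survival function from the largest atom downwards.
    for s in range(n, -1, -1):
        survival += pmf[s]
        d_s = (s - lam) / q
        nu = s * pmf[s] / survival
        rows.append((s, d_s, pmf[s], survival, nu))
    rows.reverse()
    return rows


if __name__ == "__main__":
    for row in binomial_description(2, 1.0):
        print(row)
```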

Now we can describe the transform $G_2$. Consider a sequence

$$0 = r_0 < r_1 < \dots < r_{n-1} < r_n = n$$

of points which divide the interval $[0, n]$ into $n$ subintervals $[r_s, r_{s+1}]$. To define $G_2$ take

$$r_s = \frac{[\ldots]}{q\nu_{n,s} + \lambda - s}, \qquad s = 0, 1, \dots, n-1,$$

and



3. Proofs 

Proof of Theorem 1.2. This theorem is a special case of Theorem 1.4. Indeed, choosing

$$\sigma_k^2 = \infty, \qquad \varkappa_k = \infty, \qquad k = 1, 2, \dots, n,$$

we have $v(\varkappa_k) = \infty$. Hence $\alpha_k^2$ from the condition of Theorem 1.4 have to satisfy $\alpha_k^2 \ge u^2(\gamma_k)$.
We choose $\alpha_k^2 = u^2(\gamma_k)$. Then $\sigma^2 = (u^2(\gamma_1) + \dots + u^2(\gamma_n))/n$. A small calculation shows that
with such $\sigma^2$ the skewness $\gamma = 1/\sigma - \sigma$ of Bernoulli random variables $\varepsilon = \varepsilon(\sigma^2)$ coincides with
the expression given in (1.12). □

Proof of Theorem 1.3. This theorem is a special case of Theorem 1.4. Indeed, choosing

$$\sigma_k^2 = \infty, \qquad \gamma_k = -\infty, \qquad k = 1, 2, \dots, n,$$

we have $u(\gamma_k) = \infty$. Hence $\alpha_k^2$ from the condition of Theorem 1.4 have to satisfy $\alpha_k^2 \ge v(\varkappa_k)$.
We choose $\alpha_k^2 = v(\varkappa_k)$. Then $\sigma^2 = (v(\varkappa_1) + \dots + v(\varkappa_n))/n$. □



In the proof of Theorem 1.4 we use the next two lemmas.

Lemma 3.1. Assume that a random variable $X \le 1$ has mean $\mathbb{E}X = 0$, variance $s^2 = \mathbb{E}X^2$,
and skewness such that $\mathbb{E}X^3/s^3 \ge g$. Then

$$s^2 \le u^2(g), \qquad u(x) = \sqrt{1 + \frac{x^2}{4}} - \frac{x}{2}. \tag{3.1}$$

Proof. It is clear that

$$(t + s^2)^2 (1 - t) \ge 0 \qquad \text{for } t \le 1. \tag{3.2}$$

Replacing in (3.2) the variable $t$ by $X$ and taking the expectation, we get $s^2 - s^4 \ge \mathbb{E}X^3$.
Dividing by $s^3$ and using $\mathbb{E}X^3/s^3 \ge g$, we derive $1/s - s \ge g$. Elementary considerations show
that the latter inequality implies (3.1). □
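For the reader's convenience, the two steps left implicit in the proof can be written out. The following is a sketch of the computation, using only $\mathbb{E}X = 0$ and $\mathbb{E}X^2 = s^2$; the degenerate case $s = 0$ is trivial and is excluded before dividing by $s^3$.

```latex
\begin{align*}
(t+s^2)^2(1-t) &= t^2 + 2s^2 t + s^4 - t^3 - 2s^2 t^2 - s^4 t, \\
\mathbb{E}\,(X+s^2)^2(1-X) &= s^2 + s^4 - \mathbb{E}X^3 - 2s^4
   = s^2 - s^4 - \mathbb{E}X^3 \;\ge\; 0, \\
\frac{1}{s} - s &\ge \frac{\mathbb{E}X^3}{s^3} \;\ge\; g, \\
s^2 + g s - 1 &\le 0
   \quad\Longrightarrow\quad
   s \le \sqrt{1 + \frac{g^2}{4}} - \frac{g}{2} = u(g).
\end{align*}
```

The last implication holds because $u(g)$ is the positive root of $s^2 + gs - 1 = 0$ and $s > 0$; squaring gives $s^2 \le u^2(g)$.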



Lemma 3.2. Assume that a random variable $X \le 1$ has mean $\mathbb{E}X = 0$, variance $s^2 = \mathbb{E}X^2$,
and kurtosis such that $\mathbb{E}X^4/s^4 \le c$ with some $c \ge 1$. Then

$$s^2 \le v(c), \qquad 2v(t) = t + 1 + \sqrt{(t+1)^2 - 4}. \tag{3.3}$$

Proof. By Hölder's inequality we have $\mathbb{E}X^4/s^4 \ge 1$. Hence, the condition $c \ge 1$ is natural. The
function $v$ satisfies $v(c) \ge 1$ for $c \ge 1$. Therefore in cases where $s^2 \le 1$, inequality (3.3) turns to
the trivial $s^2 \le 1 \le v(c)$. Excluding this trivial case from the further considerations, we assume
that $s^2 > 1$. Write $a = 2s^2 - 1$. Then $a > 1$. It is clear that

$$(t + s^2)^2 (1 - t)(a - t) \ge 0 \qquad \text{for } t \le 1. \tag{3.4}$$

Replacing in (3.4) the variable $t$ by $X$ and taking the expectation, we get $\mathbb{E}X^4 \ge s^2 - s^4 + s^6$.
Dividing by $s^4$ and using $\mathbb{E}X^4/s^4 \le c$, we derive $1/s^2 - 1 + s^2 \le c$. Elementary considerations
show that the latter inequality implies (3.3). □
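The choice $a = 2s^2 - 1$ is exactly what makes the cubic term disappear, so that no assumption on $\mathbb{E}X^3$ is needed. A sketch of the expansion, again using only $\mathbb{E}X = 0$ and $\mathbb{E}X^2 = s^2$:

```latex
\begin{align*}
(1-t)(a-t) &= a - (a+1)\,t + t^2 = a - 2s^2 t + t^2, \\
(t+s^2)^2(1-t)(a-t) &= t^4 + (a - 3s^4)\,t^2 + 2s^2(a - s^4)\,t + a s^4, \\
\mathbb{E}\,(X+s^2)^2(1-X)(a-X) &= \mathbb{E}X^4 + (a - 3s^4)\,s^2 + a s^4
   = \mathbb{E}X^4 - s^2 + s^4 - s^6 \;\ge\; 0, \\
\frac{1}{s^2} - 1 + s^2 &\le \frac{\mathbb{E}X^4}{s^4} \;\le\; c, \\
s^4 - (c+1)\,s^2 + 1 &\le 0
   \quad\Longrightarrow\quad
   s^2 \le \frac{c + 1 + \sqrt{(c+1)^2 - 4}}{2} = v(c).
\end{align*}
```

The final step uses that $s^2$ must lie between the two roots of $y^2 - (c+1)y + 1 = 0$, the larger of which is $v(c)$.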

Proof of Theorem 1.4. The proof starts with an application of the Chebyshev inequality
similar to (1.7). This reduces the estimation of $\mathbb{P}\{M_n \ge x\}$ to estimation of expectations

$$\mathbb{E}\exp\{hM_n\}, \qquad \mathbb{E}(M_n - h)_+^2.$$

As it is noted in the proof of Lemma 4.4 in B 2004, it suffices to estimate $\mathbb{E}(M_n - h)_+^2$ since
the desired bound for the other expectation is implied by

$$\mathbb{E}(M_n - h)_+^2 \le \mathbb{E}(T_n - h)_+^2. \tag{3.5}$$

Let us prove (3.5). By Lemma 3.1 the condition $g_k \ge \gamma_k$ implies $s_k^2 \le u^2(\gamma_k)$. While applying
Lemma 3.1 one has to replace $X$ by $X_k$, etc. In a similar way, by Lemma 3.2 the
condition $c_k \le \varkappa_k$ implies $s_k^2 \le v(\varkappa_k)$. Combining the inequalities and the assumption $s_k^2 \le \sigma_k^2$, we
have

$$s_k^2 \le \min\{\sigma_k^2,\ u^2(\gamma_k),\ v(\varkappa_k)\}. \tag{3.6}$$

The inequality (3.6) together with the condition of the theorem yields $s_k^2 \le \alpha_k^2$. As it is shown
in the proof of Theorem 1.1 in B 2004, the latter inequality implies (3.5). □






V. BENTKUS 



References 

[B] Bentkus, V., On measure concentration for separately Lipschitz functions in product spaces, Israel J. Math.
158 (2007), 1-17.

[B] Bentkus, V., On Hoeffding's inequalities, Ann. Probab. 32 (2004), no. 2, 1650-1673.

[B] Bentkus, V., An inequality for tail probabilities of martingales with differences bounded from one side 16
(2003), no. 1, 161-173.

[B] Bentkus, V., A remark on the inequalities of Bernstein, Prokhorov, Bennett, Hoeffding, and Talagrand, Lith.
Math. J. 42 (2002), no. 3, 262-269.

[B] Bentkus, V., An inequality for tail probabilities of martingales with bounded differences, Lith. Math. J. 42
(2002), no. 3, 255-261.

[B] Bentkus, V., An inequality for large deviation probabilities of sums of bounded i.i.d. random variables, Lith.
Math. J. 41 (2001), no. 2, 112-119.

[BGZ] Bentkus, V., Geuze, G.D.C., and van Zuijlen, M., Optimal Hoeffding-like inequalities under a symmetry
assumption, Statistics 40 (2006), no. 2, 159-164.

[BGZ] Bentkus, V., Geuze, G.D.C., and van Zuijlen, M., Unimodality: The linear case, Report no. 0607 of Dept.
of Math. Radboud University Nijmegen (2006), 1-11.

[BGZ] Bentkus, V., Geuze, G.D.C., and van Zuijlen, M., Unimodality: The general case, Report no. 0608 of
Dept. of Math. Radboud University Nijmegen (2006), 1-24.

[BGPZ] Bentkus, V., Geuze, G.D.C., Pinenberg, M.G.F., and van Zuijlen, M., Unimodality: The symmetric case,
Report no. 0612 of Dept. of Math. Radboud University Nijmegen (2006), 1-12.

[BKZ] Bentkus, V., Kalosha, N., and van Zuijlen, M., Confidence bounds for the mean in nonparametric
multisample problems, Statist. Neerlandica 61 (2007), no. 2, 209-231.

[BKZ] Bentkus, V., Kalosha, N., and van Zuijlen, M., On domination of tail probabilities of (super)martingales:
explicit bounds, Lith. Math. J. 46 (2006), no. 1, 1-43.

[BZ] Bentkus, V., and van Zuijlen, M., On conservative confidence intervals, Lith. Math. J. 43 (2003), no. 2,
141-160.

van de Geer, S. A., On Hoeffding's inequalities for dependent random variables, Empirical process techniques
for dependent data, Birkhäuser Boston, Boston, MA Contemp. Math., 234, Amer. Math. Soc., Providence, RI,
2002, pp. 161-169.

Godbole, A., and Hitczenko, P., Beyond the method of bounded differences, Microsurveys in discrete probability
(Princeton, NJ, 1997), DIMACS Ser. Discrete Math. Theoret. Comput. Sci., Amer. Math. Soc., Providence, RI
41 (1998), 43-58.

Hoeffding, W., Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (1963),
13-30.

Laib, N., Exponential-type inequalities for martingale difference sequences. Application to nonparametric
regression estimation, Comm. Statist. Theory Methods 28 (1999), 1565-1576.

McDiarmid, C., On the method of bounded differences, Surveys in combinatorics, 1989 (Norwich 1989), London
Math. Soc. Lecture Note Ser., vol. 141, 1989, pp. 148-188.

Perron, F., Extremal properties of sums of Bernoulli random variables, Statist. Probab. Lett. 62 (2003), 345-354.

Pinelis, I., Toward the best constant factor for the Rademacher-Gaussian tail comparison, ESAIM Probab. Stat.
11 (2007), 412-426.

Pinelis, I., Inequalities for sums of asymmetric random variables, with applications, Probab. Theory Related
Fields 139 (2007), no. 3-4, 605-635.

Pinelis, I., On normal domination of (super)martingales, Electron. J. Probab. 11 (2006), no. 39, 1049-1070.

Pinelis, I., Fractional sums and integrals of r-concave tails and applications to comparison probability inequalities,
Advances in stochastic inequalities (Atlanta, GA, 1997), Contemp. Math., 234, Amer. Math. Soc., Providence,
RI, 1999, pp. 149-168.

Pinelis, I., Optimal tail comparison based on comparison of moments, High dimensional probability (Oberwolfach,
1996), Progr. Probab., 43, Birkhäuser, Basel, 1998, pp. 297-314.

Talagrand, M., The missing factor in Hoeffding's inequalities, Ann. Inst. H. Poincaré Probab. Statist. 31 (1995),
no. 4, 689-702.

Vilnius institute of mathematics and informatics, Akademijos 4, LT-08663 Vilnius 
E-mail address: bentkus@ktl.mii.lt



